Smart Paper Notes

A Retrospective on ICSE 2022

Cailin Winston, Caleb Winston, Chloe Winston, Claris Winston, Cleah Winston

Category: Machine Learning

2022-07-26

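The 44th International Conference on Software Engineering (ICSE 2022) was held in person from May 22 to May 27, 2022, in Pittsburgh, Pennsylvania, USA. Here, we summarize the research themes and research directions in software engineering and testing that we observed at the conference.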

Free-form 3D Scene Inpainting with Dual-stream GAN

Ru-Fen Jheng, Tsung-Han Wu, Jia-Fong Yeh, Winston H. Hsu

Category: Computer Vision

2022-12-16

The need for user editing of 3D scenes has grown rapidly with the development of AR and VR technology. However, the existing 3D scene completion task (and its datasets) does not meet this need, because the missing regions in those scenes arise from sensor limitations or object occlusion. Thus, we present a novel task named free-form 3D scene inpainting. Unlike scenes in previous 3D completion datasets, which preserve most of the main structures and hints of detailed shapes around missing regions, the proposed inpainting dataset, FF-Matterport, contains large and diverse missing regions formed by our free-form 3D mask generation algorithm, which mimics human drawing trajectories in 3D space. Moreover, prior 3D completion methods do not perform well on this challenging yet practical task, because they simply interpolate nearby geometry and color context. Thus, a tailored dual-stream GAN method is proposed. First, our dual-stream generator, fusing both geometry and color information, produces distinct semantic boundaries and solves the interpolation issue. To further enhance the details, our lightweight dual-stream discriminator regularizes the geometry and color edges of the predicted scenes to be realistic and sharp. We conducted experiments with the proposed FF-Matterport dataset. Qualitative and quantitative results validate the superiority of our approach over existing scene completion methods and the efficacy of all proposed components.

Unimodal and Multimodal Representation Training for Relation Extraction

Ciaran Cooney, Rachel Heyburn, Liam Maddigan, Mairead O'Cuinn, Chloe Thompson, Joana Cavadas

Category: Natural Language Processing

2022-11-11

Multimodal integration of text, layout and visual information has achieved SOTA results in visually rich document understanding (VrDU) tasks, including relation extraction (RE). However, despite its importance, evaluation of the relative predictive capacity of these modalities is less prevalent. Here, we demonstrate the value of shared representations for RE tasks by conducting experiments in which each data type is iteratively excluded during training. In addition, text and layout data are evaluated in isolation. While a bimodal text and layout approach performs best (F1=0.684), we show that text is the most important single predictor of entity relations. Additionally, layout geometry is highly predictive and may even be a feasible unimodal approach. Despite being less effective, we highlight circumstances where visual information can bolster performance. In total, our results demonstrate the efficacy of training joint representations for RE.

Late Fusion with Triplet Margin Objective for Multimodal Ideology Prediction and Analysis

Changyuan Qiu, Winston Wu, Xinliang Frederick Zhang, Lu Wang

Category: Natural Language Processing

2022-11-04

Prior work on ideology prediction has largely focused on single modalities, i.e., text or images. In this work, we introduce the task of multimodal ideology prediction, where a model predicts binary or five-point scale ideological leanings, given a text-image pair with political content. We first collect five new large-scale datasets with English documents and images along with their ideological leanings, covering news articles from a wide range of US mainstream media and social media posts from Reddit and Twitter. We conduct in-depth analyses of news articles and reveal differences in image content and usage across the political spectrum. Furthermore, we perform extensive experiments and ablation studies, demonstrating the effectiveness of targeted pretraining objectives on different model components. Our best-performing model, a late-fusion architecture pretrained with a triplet objective over multimodal content, outperforms the state-of-the-art text-only model by almost 4% and a strong multimodal baseline with no pretraining by over 3%.
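The abstract above names a triplet objective but does not give its exact form. As a rough illustration only, the following sketch implements the standard triplet margin loss, max(0, d(a, p) - d(a, n) + margin), on plain Python lists. The choice of Euclidean distance, the margin value, and the reading of anchor, positive, and negative as embeddings of documents with the same or a different ideological leaning are all assumptions made for this example, not details taken from the paper.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss for one (anchor, positive, negative) triplet.

    Hypothetical reading for this listing: `anchor` and `positive` are
    embeddings of content with the same ideological leaning, `negative`
    is an embedding of content with a different leaning.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    # The loss is zero once the negative is at least `margin` farther
    # from the anchor than the positive is.
    return max(0.0, d_pos - d_neg + margin)


if __name__ == "__main__":
    a = [0.0, 0.0]
    p = [0.0, 1.0]
    print(triplet_margin_loss(a, p, [3.0, 4.0]))  # negative far away
    print(triplet_margin_loss(a, p, [1.0, 0.0]))  # negative as close as positive
```

A triplet whose negative is already well separated contributes no gradient, so training of this kind concentrates on triplets that still violate the margin.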

CrossDTR: Cross-view and Depth-guided Transformers for 3D Object Detection

Ching-Yu Tseng, Yi-Rong Chen, Hsin-Ying Lee, Tsung-Han Wu, Wen-Chin Chen, Winston Hsu

Category: Computer Vision | Robotics

2022-09-27

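To achieve accurate 3D object detection at a low cost for autonomous driving, many multi-camera methods have been proposed, and they solve the occlusion problem of monocular approaches. However, due to the lack of accurately estimated depth, existing multi-camera methods often generate multiple bounding boxes along the depth direction for difficult small objects such as pedestrians, resulting in extremely low recall. Furthermore, directly applying depth prediction modules to existing multi-camera methods, which are generally composed of large network architectures, cannot meet the real-time requirements of self-driving applications. To address these issues, we propose Cross-view and Depth-guided Transformers for 3D Object Detection, CrossDTR. First, our lightweight depth predictor is designed to produce precise object-wise sparse depth maps and low-dimensional depth embeddings without extra depth datasets during supervision. Second, a cross-view depth-guided transformer is developed to fuse the depth embeddings with image features from cameras of different views and generate 3D bounding boxes. Extensive experiments demonstrate that our method surpasses existing multi-camera methods by 10 percent in pedestrian detection and by about 3 percent in the overall mAP and NDS metrics. Computational analyses also show that our method is 5 times faster than prior approaches. Our code will be made publicly available at https://github.com/sty61010/crossdtr.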

Orbeez-SLAM: A Real-time Monocular Visual SLAM with ORB Features and NeRF-realized Mapping

Chi-Ming Chung, Yang-Che Tseng, Ya-Ching Hsu, Xiang-Qian Shi, Yun-Hung Hua, Jia-Fong Yeh, Wen-Chin Chen, Yi-Ting Chen, Winston H. Hsu

Category: Robotics | Computer Vision

2022-09-27

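A spatial AI that can perform complex tasks through visual signals and cooperate with humans is highly anticipated. To achieve this, we need a visual SLAM that easily adapts to new scenes without pre-training and generates dense maps for downstream tasks in real time. None of the previous learning-based and non-learning-based visual SLAMs satisfy all of these needs, due to the intrinsic limitations of their components. In this work, we develop a visual SLAM named Orbeez-SLAM, which successfully combines an implicit neural representation (NeRF) with visual odometry to achieve our goals. Moreover, Orbeez-SLAM can work with a monocular camera since it only needs RGB inputs, making it widely applicable to the real world. We validate its effectiveness on various challenging benchmarks. Results show that our SLAM is up to 800x faster than the strong baseline, with superior rendering outcomes.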

Fair Robust Active Learning by Joint Inconsistency

Tsung-Han Wu, Shang-Tse Chen, Winston H. Hsu

Category: Machine Learning | Computer Vision

2022-09-22

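Fair Active Learning (FAL) uses active learning techniques to achieve high model performance with limited data and to reach fairness between sensitive groups (e.g., genders). However, the impact of adversarial attacks, which is vital for various safety-critical machine learning applications, has not yet been addressed in FAL. Observing this, we introduce a novel task, Fair Robust Active Learning (FRAL), integrating conventional FAL and adversarial robustness. FRAL requires ML models to leverage active learning techniques to jointly achieve equalized performance on benign data and equalized robustness against adversarial attacks between groups. In this new task, previous FAL methods generally suffer from an unbearable computational burden and from ineffectiveness. Therefore, we develop a simple yet effective FRAL strategy based on Joint INconsistency (JIN). To efficiently find samples whose labeling can boost the performance and robustness of disadvantaged groups, our method exploits the prediction inconsistency between benign and adversarial samples as well as between standard and robust models. Extensive experiments under diverse datasets and sensitive groups demonstrate that our method not only achieves fairer performance on benign samples but also obtains fairer robustness under white-box PGD attacks, compared with existing active learning and FAL baselines. We are optimistic that FRAL will pave a new path for developing safe and robust ML research and applications, such as facial attribute recognition in biometric systems.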
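The selection idea in the abstract above, ranking unlabeled samples by prediction inconsistency, can be sketched roughly as follows. This is not the authors' JIN implementation: the L1 gap between probability vectors, the unweighted sum of the two inconsistency terms, and the field names `std_benign`, `rob_benign`, and `rob_adv` are all hypothetical choices made for illustration. The restriction of the pool to a disadvantaged group is left out.

```python
def l1_gap(p, q):
    """L1 distance between two class-probability vectors (assumed measure)."""
    return sum(abs(a - b) for a, b in zip(p, q))


def joint_inconsistency(std_benign, rob_benign, rob_adv):
    """Sum of two inconsistency terms for one unlabeled sample.

    std_benign: standard model's prediction on the benign input
    rob_benign: robust model's prediction on the benign input
    rob_adv:    robust model's prediction on the adversarial input
    """
    model_gap = l1_gap(std_benign, rob_benign)  # standard vs. robust model
    sample_gap = l1_gap(rob_benign, rob_adv)    # benign vs. adversarial input
    return model_gap + sample_gap


def select_for_labeling(pool, budget):
    """Return ids of the `budget` samples with the highest joint score."""
    ranked = sorted(
        pool,
        key=lambda s: joint_inconsistency(
            s["std_benign"], s["rob_benign"], s["rob_adv"]
        ),
        reverse=True,
    )
    return [s["id"] for s in ranked[:budget]]


if __name__ == "__main__":
    pool = [
        {"id": "a", "std_benign": [1.0, 0.0], "rob_benign": [0.5, 0.5], "rob_adv": [0.0, 1.0]},
        {"id": "b", "std_benign": [1.0, 0.0], "rob_benign": [1.0, 0.0], "rob_adv": [1.0, 0.0]},
        {"id": "c", "std_benign": [1.0, 0.0], "rob_benign": [1.0, 0.0], "rob_adv": [0.5, 0.5]},
    ]
    print(select_for_labeling(pool, budget=2))
```

Samples on which every prediction agrees score zero and are never selected ahead of samples where the models or the inputs disagree.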

CFVS: Coarse-to-Fine Visual Servoing for 6-DoF Object-Agnostic Peg-In-Hole Assembly

Bo-Siang Lu, Tung-I Chen, Hsin-Ying Lee, Winston H. Hsu

Category: Robotics

2022-09-19

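Robotic peg-in-hole assembly remains a challenging task due to its high accuracy demands. Previous work tends to simplify the problem by restricting the degrees of freedom of the end-effector or by limiting the distance between the target and the initial pose, which prevents these methods from being deployed in the real world. Thus, we present a Coarse-to-Fine Visual Servoing (CFVS) peg-in-hole method that achieves 6-DoF end-effector motion control based on 3D visual feedback. CFVS can handle arbitrary tilt angles and large initial alignment errors through a fast pose estimation before refinement. Furthermore, by introducing a confidence map to ignore irrelevant object contours, CFVS is robust against noise and can deal with various targets beyond the training data. Extensive experiments show that CFVS outperforms state-of-the-art methods and obtains average success rates of 100%, 91%, and 82% in 3-DoF, 4-DoF, and 6-DoF peg-in-hole, respectively.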

Optimization of Mobile Robotic Relay Operation for Minimal Average Wait Time

Winston Hurst , Yasamin Mostofi

Category: Robotics

2022-08-23

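This paper considers trajectory planning for a mobile robot that persistently relays data between pairs of far-away communication nodes. Data accumulates at each source, and the robot must move to appropriate positions to enable data offload to the corresponding destination. The robot needs to minimize the average time that data waits at a source before being serviced. We are interested in finding optimal robotic routing policies consisting of 1) the locations where the robot stops to relay (relay positions) and 2) conditional transition probabilities that determine the sequence in which the pairs are serviced. We first pose this as a non-convex problem that optimizes over both relay positions and transition probabilities. To find approximate solutions, we propose a novel algorithm that alternately optimizes relay positions and transition probabilities. For the former, we find efficient convex partitions of the non-convex relay regions and then formulate a mixed-integer second-order cone problem. For the latter, we find optimal transition probabilities via sequential least squares programming. We extensively analyze the proposed approach and mathematically characterize important system properties related to the robot's long-term energy consumption and service rate. Finally, through extensive simulation with real channel parameters, we verify the efficacy of our approach.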

Detecting Schizophrenia with 3D Structural Brain MRI Using Deep Learning

Junhao Zhang, Vishwanatha M. Rao, Ye Tian, Yanting Yang, Nicolas Acosta, Zihan Wan, Pin-Yu Lee, Chloe Zhang, Lawrence S. Kegeles, Scott A. Small

Category: Computer Vision

2022-06-26

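Schizophrenia is a chronic neuropsychiatric disorder that causes distinct structural alterations within the brain. We hypothesize that deep learning applied to a structural neuroimaging dataset could detect disease-related alterations and improve classification and diagnostic accuracy. We tested this hypothesis using a single, widely available, conventional T1-weighted MRI scan, from which we extracted the 3D whole-brain structure using standard post-processing methods. A deep learning model was then developed, optimized, and evaluated on three open datasets of T1-weighted MRI scans of patients with schizophrenia. Our proposed model outperformed the benchmark model, which was also trained on structural MR images using a 3D CNN architecture. Our model is capable of almost perfectly (area under the ROC curve = 0.987) distinguishing schizophrenia patients from healthy controls on unseen structural MRI scans. Regional analysis localized subcortical regions and ventricles as the most predictive brain regions. Subcortical structures play a pivotal role in cognitive, affective, and social functions in humans, and structural abnormalities of these regions have been associated with schizophrenia. Our finding corroborates that schizophrenia is associated with widespread alterations in subcortical brain structure, and that subcortical structural information provides prominent features for diagnostic classification. Together, these results further demonstrate the potential of deep learning to improve schizophrenia diagnosis and to identify its structural neuroimaging signatures from a single, standard T1-weighted brain MRI.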
